Fast Enumeration of Combinatorial Objects

Boris Ryabko 



Summary. The problem of ranking (or perfect hashing) is well known in
Combinatorial Analysis, Computer Science, and Information Theory. There
are widely used methods for ranking permutations of the numbers $\{1, 2, \dots, n\}$, $n > 1$,
for ranking binary words of length $n$ with a fixed number of ones, and
for many other combinatorial problems. Many of these methods use non-exponential
memory, and their time of enumeration is $c_1 n^{c_2}$ bit operations
per letter, where $c_1 > 0$, $c_2 \ge 1$, $n \to \infty$. In this paper we suggest a method
which also uses non-exponential memory and has an enumeration time of
$O((\log n)^{\mathrm{const}})$ bit operations per letter, $\mathrm{const} > 0$, $n \to \infty$.



Index terms: fast ranking, enumerative encoding, perfect hashing. 

1 Introduction 

The problem of ranking can be described as follows. We have a set $S$ of combinatorial
objects, such as, say, the $k$-subsets of $n$ things; we imagine that they have been
arranged in some list, say lexicographically, and we want a fast method for obtaining
the rank of a given object in the list. This problem is widely known in Combinatorial
Analysis, Computer Science and Information Theory (see [1, 2]). Ranking is closely
connected with the hashing problem, especially with perfect hashing, and with the
generation of random combinatorial objects. In Information Theory the ranking problem
is closely connected with so-called enumerative encoding [3], which may be described
as follows: there is a set of words $S$, and an enumerative code has to encode every
$s \in S$ one-to-one by a binary word $\mathrm{code}(s)$. The length of $\mathrm{code}(s)$
must be the same for all $s \in S$. Clearly, $|\mathrm{code}(s)| \ge \log |S|$. (Here and
below $\log x = \log_2 x$.)

The simplest method of coding is to store all words from $S$ and all words
$\mathrm{code}(s)$, $s \in S$, in the memory of the encoder and the decoder. In this case the
time of encoding and decoding is proportional to $\log |S|$ and obviously has the
minimal possible value within a multiplicative constant as $|S|$ grows. However, the
memory size of the encoder and decoder increases exponentially (as a function
of the word length), simply because they must store all words $s \in S$ and
$\mathrm{code}(s)$, $s \in S$. Fortunately, for many interesting and important enumeration
problems there exist methods which do not use exponential memory, see, for example,
[1, 2]. We consider two examples of such problems: the enumeration of permutations
and the enumeration of binary words with a given number of ones. These examples are
well known in Combinatorial Analysis. Note that the second problem is very important
for Information Theory, where it forms the basis of many data compression codes. The
first code which does not use exponential memory was developed by Lynch [4], Davisson
[5] and Babkin [6] (see also [2]). For this code the time of encoding and decoding per
letter is more than $\mathrm{const} \cdot n$ bit operations. The same holds for the time of
encoding and decoding of the known methods for ranking permutations.

In this paper we suggest a new method for ranking (or enumerative encoding) for which
the time of encoding and decoding is $O(\log^{\mathrm{const}} n)$ bit operations per letter.
This method is based on the divide-and-conquer principle and uses the Schönhage–Strassen
method of fast multiplication. The proposed method is faster than the known ones
whenever there exists an algorithm with non-exponential memory size. It speeds up
encoding and decoding exponentially (from a power of $n$ to a power of $\log n$ per letter)
for all the combinatorial enumeration problems considered, for example, in [1] and [2],
including the enumeration of permutations, compositions and others.

Section 2 describes the main idea of the proposed method. Encoding and decoding are
described in Sections 3 and 4, respectively.
2 The Main Idea



The simplest but important example of the problem of ranking (and enumerative
encoding) is the problem of converting an integer from one radix to another. We use
this example to illustrate the main idea of the proposed method.

Consider the task of converting an integer from radix $m$ ($m > 2$) to the binary
system. Let an integer $x_1 x_2 \dots x_n$, $n > 1$, be given in the number system with
radix $m$. A "common" method of conversion is based on the following equality:

$$\mathrm{code}(x_1 \dots x_n) = \sum_{i=1}^{n} x_i m^{n-i}.$$

Instead of this formula we can use the well-known Horner scheme:

$$\mathrm{code}(x_1 \dots x_n) = (\dots((x_1 m + x_2) m + x_3) m + \dots) m + x_n \tag{1}$$

All calculations are performed in the binary system, and as a result
$\mathrm{code}(x_1 \dots x_n)$ is the binary notation of the number $x_1 x_2 \dots x_n$. Let us
estimate the time required for the calculation by (1). Here and below, time is measured
by the number of operations on single-bit words (bit operations).

When calculating $(x_1 m + x_2)$ we obtain a number of length $2\lceil \log m \rceil$ bits, when
calculating $((x_1 m + x_2) m + x_3)$ a number $3\lceil \log m \rceil$ bits long, and so on. To
compute these values we have at least to look through words of lengths
$2\lceil \log m \rceil, 3\lceil \log m \rceil, \dots, n\lceil \log m \rceil$. So it takes no fewer than
$c\, n^2 \log m$ bit operations to calculate $\mathrm{code}(x_1 \dots x_n)$ by (1), and the time
per letter is at least $c\, n \log m$.

The main idea of our approach is very simple. First, we propose a new arrangement of
brackets:

$$\mathrm{code}(x_1 \dots x_n) = \Big(\dots\big(((x_1 m + x_2)(m \cdot m) + (x_3 m + x_4))((m \cdot m)(m \cdot m)) + ((x_5 m + x_6)(m \cdot m) + (x_7 m + x_8))\big)\dots\Big) \tag{2}$$

When (2) is used, most of the multiplications are carried out with short numbers, so
the total time of calculation is small.
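The effect of the new bracketing can be tried out with a short Python sketch. The function names `horner` and `bracketed` are illustrative and not from the paper. `bracketed` merges neighbouring blocks level by level, as in (2), keeping for every block the power of $m$ by which it must be shifted. CPython multiplies large integers with the schoolbook or Karatsuba method rather than Schönhage–Strassen, so the sketch shows the order of operations, not the asymptotic bound.

```python
def horner(digits, m):
    """Formula (1): (...((x1*m + x2)*m + x3)*m + ...)*m + xn."""
    value = 0
    for d in digits:
        value = value * m + d  # the accumulator grows by about log2(m) bits per digit
    return value


def bracketed(digits, m):
    """Formula (2): combine neighbouring blocks pairwise, level by level."""
    # Each block is (value of its digits, m ** number_of_digits_in_block).
    blocks = [(d, m) for d in digits]
    while len(blocks) > 1:
        merged = []
        for i in range(0, len(blocks) - 1, 2):
            (left, left_pow), (right, right_pow) = blocks[i], blocks[i + 1]
            # "left block followed by right block" = left * m^len(right) + right
            merged.append((left * right_pow + right, left_pow * right_pow))
        if len(blocks) % 2:  # odd number of blocks: the last one moves up unchanged
            merged.append(blocks[-1])
        blocks = merged
    return blocks[0][0] if blocks else 0


digits = [3, 1, 4, 1, 5, 9, 2, 6]  # x1 ... x8 in radix 10
print(horner(digits, 10), bracketed(digits, 10))  # 31415926 31415926
```

On the first level all products involve one-digit operands; only the few products near the root of the bracket tree are long.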

Secondly, we propose to use a fast method of multiplication in (2). We use the
Schönhage–Strassen method of multiplication, which is the fastest of the methods
described in [7, 8]. In this method the time $T(L)$ of multiplying two binary numbers with
$L$ digits (and the time of dividing a number with $2L$ digits by a number with $L$ digits)
is given by

$$T(L) = O(L \log L \log\log L), \quad L \to \infty. \tag{3}$$

Let us estimate the time of calculation when (2) is used. Calculating
$(m \cdot m), (x_1 m + x_2), (x_3 m + x_4), \dots, (x_{n-1} m + x_n)$ takes $(n/2) + 1$
multiplications of numbers with $\lceil \log m \rceil$ digits; calculating $((m \cdot m)(m \cdot m))$,
$(x_1 m + x_2)(m \cdot m) + (x_3 m + x_4), \dots, (x_{n-3} m + x_{n-2})(m \cdot m) + (x_{n-1} m + x_n)$
takes $(n/4) + 1$ multiplications of numbers with $2\lceil \log m \rceil$ digits, and so on. Using
this and the estimate (3), we see that the time of calculating $\mathrm{code}(x_1 \dots x_n)$ by (2) is

$$O\big((n/2)(\log m \,\log\log m \,\log\log\log m) + (n/4)(2\log m \,\log(2\log m)\,\log\log(2\log m)) + \dots\big) = O(n \log^2 n \log\log n).$$

So the time per letter is $O(\log^2 n \log\log n)$ (for a fixed radix $m$).

Thus the proposed method takes $O(\log^2 n \log\log n)$ bit operations per letter instead of
at least $c\, n \log m$ bit operations.

Note that our scheme also applies to the evaluation of any given polynomial.

Claim 1. Let $P(a) = y_1 a^{n-1} + y_2 a^{n-2} + \dots + y_n$ be a polynomial, where
$y_1, y_2, \dots, y_n$ are integers, and let $m = \log(\max\{|a|, |y_1|, \dots, |y_n|\})$. The method
of calculating the value $P(a)$ according to the formula

$$P(a) = \Big(\dots\big(((y_1 a + y_2)(a \cdot a) + (y_3 a + y_4))((a \cdot a)(a \cdot a)) + ((y_5 a + y_6)(a \cdot a) + (y_7 a + y_8))\big)\dots\Big),$$

which uses the Schönhage–Strassen method of multiplication, takes no more than
$c \cdot n \cdot m \log^2(n \cdot m) \log\log(n \cdot m)$ bit operations, where $c$ is a constant,
$n \to \infty$.

On the other hand, calculation by Horner's scheme takes at least
$\mathrm{const} \cdot (n^2 \cdot m)$ bit operations.
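A minimal sketch of the bracketing in Claim 1, assuming an integer point $a$ and integer coefficients given highest degree first (the names `eval_bracketed` and `eval_horner` are hypothetical). Here the coefficient list is split recursively at its midpoint, which gives the same kind of balanced bracket tree and also handles lengths that are not powers of two.

```python
def _block(coeffs, a):
    """Return (value of coeffs as a polynomial in a, a ** len(coeffs))."""
    if len(coeffs) == 1:
        return coeffs[0], a
    mid = len(coeffs) // 2
    left_val, left_pow = _block(coeffs[:mid], a)
    right_val, right_pow = _block(coeffs[mid:], a)
    # Concatenating coefficient blocks: P_left(a) * a^len(right) + P_right(a)
    return left_val * right_pow + right_val, left_pow * right_pow


def eval_bracketed(coeffs, a):
    """P(a) = y1*a^(n-1) + ... + yn with balanced brackets; coeffs = [y1, ..., yn]."""
    return _block(coeffs, a)[0] if coeffs else 0


def eval_horner(coeffs, a):
    """The same value by Horner's scheme, for comparison."""
    value = 0
    for y in coeffs:
        value = value * a + y
    return value


print(eval_bracketed([2, -3, 0, 5], 7))  # 2*343 - 3*49 + 0*7 + 5 = 544
```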

This simple idea will be used in this paper for fast ranking and enumerative coding in
the general case. Note that the "proper" arrangement of brackets is a special case of
the divide-and-conquer principle (see the definition in [7]).
3 Fast Ranking (or Encoding)



Let $m \ge 2$ be an integer, $A = \{a_1, a_2, \dots, a_m\}$ the alphabet and $A^n$ the set of
words of length $n$ over $A$, where $n \ge 1$ is an integer. Every $S \subset A^n$ is called a
source. An enumerative code $\varphi$ is given by two mappings $\varphi_c : S \to \{0, 1\}^t$,
where $t = \lceil \log |S| \rceil$, and $\varphi_d : \varphi_c(S) \to S$, such that $\varphi_d(\varphi_c(s)) = s$
for all $s \in S$ (here and below, $|x|$ is the cardinality of $x$ if $x$ is a set, and the length
of $x$ if $x$ is a word). The map $\varphi_c$ is the encoder and the map $\varphi_d$ is the decoder.
For the sake of simplicity we identify every binary word with a number from the interval
$[0, 1]$; for example, $0110 = 3/8$. Without loss of generality we assume that the alphabet
$A$ is the set of integers from the interval $[0, m-1]$, so that we may apply the
lexicographic order to $A^n$.

Let us describe the enumerative code from [3]. Denote by $N_S(x_1 \dots x_k)$ the number of
words which belong to $S$ and have the prefix $x_1 \dots x_k$, $k = 1, 2, \dots, n$. For
$x_1 x_2 \dots x_n \in S$ define

$$\mathrm{code}(x_1 \dots x_n) = \sum_{i=1}^{n} \sum_{a < x_i} N_S(x_1 \dots x_{i-1} a). \tag{4}$$

This is the code word for $x_1 \dots x_n$. Note that there are many interesting cases where
formula (4) allows the code to be calculated using non-exponential memory.
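Formula (4) can be checked on a small source by brute force. The sketch below (hypothetical helper names, words as strings over the alphabet `"01"`) counts the prefixes of all words of $S$, so it uses exponential memory, exactly the naive approach mentioned in the Introduction; it is meant only as a reference for small examples.

```python
from collections import Counter
from itertools import combinations


def cover_code(word, S, alphabet):
    """Formula (4): sum of N_S(x_1 ... x_{i-1} a) over positions i and letters a < x_i."""
    # N_S(prefix) for every non-empty prefix of every word in S (exponential memory).
    N = Counter(w[:k] for w in S for k in range(1, len(w) + 1))
    return sum(N[word[:i] + a] for i, x in enumerate(word) for a in alphabet if a < x)


def fixed_weight_words(n, r):
    """All binary words of length n with exactly r ones, in lexicographic order."""
    words = ("".join("1" if j in ones else "0" for j in range(n))
             for ones in combinations(range(n), r))
    return sorted(words)


S = fixed_weight_words(8, 3)
print(len(S), cover_code("01000101", S, "01"))  # 56 21
```

The code of each word equals the number of words of $S$ that precede it lexicographically.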

We give two examples of coding according to formula (4). Both are taken from [1–3].

The first example is the enumeration of binary words with a given number of ones. Let a
source $S$ generate binary words of length $n$, $n > 0$, each word $x$ containing exactly $r$
ones, $0 \le r \le n$. It is easy to see that

$$N_S(x_1 \dots x_{k-1} 0) = \binom{n-k}{\,r - \sum_{i=1}^{k-1} x_i\,}. \tag{5}$$

Using this formula and (4) we obtain

$$\mathrm{code}(x_1 \dots x_n) = \sum_{k=1}^{n} x_k \binom{n-k}{\,r - \sum_{i=1}^{k-1} x_i\,}. \tag{6}$$

A time estimate of $c\, n \log n \log\log n$ ($c > 0$) bit operations per letter is obtained
in [2] for the problem of enumerating binary words with a given number of ones.
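Formula (6) needs only binomial coefficients. A minimal sketch (the name `rank_fixed_weight` is hypothetical) using Python's `math.comb`, which returns 0 when the lower index exceeds the upper one, so the term $\binom{0}{1}$ for the last one in the example below vanishes:

```python
from math import comb


def rank_fixed_weight(word):
    """Formula (6): sum over k of x_k * C(n - k, r - (x_1 + ... + x_{k-1}))."""
    bits = [int(c) for c in word]
    n, r = len(bits), sum(bits)
    code, ones_before = 0, 0
    for k, x in enumerate(bits, start=1):
        if x:
            # words of S agreeing on x_1..x_{k-1} and having 0 at position k, see (5)
            code += comb(n - k, r - ones_before)
            ones_before += 1
    return code


print(rank_fixed_weight("01000101"))  # C(6,3) + C(2,2) + C(0,1) = 20 + 1 + 0 = 21
```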
In the second example we consider the enumeration of permutations. Let
$A = \{1, 2, \dots, n\}$. Given $x_1 x_2 \dots x_n$ and $i$, $1 \le i \le n$, let $r_i$ denote the
number of integers which, first, are less than $x_i$ and, second, are situated to the right
of position $i$. Relation (4) becomes

$$\mathrm{code}(x_1 \dots x_n) = \sum_{i=1}^{n} r_i (n-i)!. \tag{7}$$

Using Horner's scheme we obtain

$$\mathrm{code}(x_1 \dots x_n) = (\dots((r_1 (n-1) + r_2)(n-2) + r_3)\dots) \cdot 1 + r_n. \tag{8}$$

It is easy to see that the time of calculation by (8) is at least $c\, n^2$ bit operations,
where $c > 0$ is a constant, so the time per letter is at least $c\, n$.
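The digits $r_i$ and the Horner evaluation (8) can be sketched as follows (function names are hypothetical). Every step multiplies an ever longer accumulator by a short factor, which is where the $c\, n^2$ bound comes from.

```python
from itertools import permutations
from math import factorial


def lehmer(perm):
    """r_i: how many entries to the right of position i are smaller than x_i."""
    return [sum(1 for y in perm[i + 1:] if y < x) for i, x in enumerate(perm)]


def rank_permutation(perm):
    """Horner form (8) of code = sum_i r_i (n - i)!."""
    n = len(perm)
    code = 0
    for i, r_i in enumerate(lehmer(perm), start=1):
        code = code * (n - i + 1) + r_i  # multiply by (n - i + 1), then add r_i
    return code


def rank_by_sum(perm):
    """Formula (7) evaluated term by term, for comparison."""
    n = len(perm)
    return sum(r * factorial(n - i) for i, r in enumerate(lehmer(perm), start=1))


print([rank_permutation(p) for p in permutations([1, 2, 3])])  # [0, 1, 2, 3, 4, 5]
```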

To describe the proposed method we consider a source $S \subset A^n$, $n \ge 1$, and a word
$x_1 \dots x_n \in S$. Let us define

$$P(x_1) = N(x_1)/|S|, \qquad P(x_k / x_1 \dots x_{k-1}) = N(x_1 \dots x_k)/N(x_1 \dots x_{k-1}),$$
$$q(x_1) = \sum_{a < x_1} P(a), \qquad q(x_k / x_1 \dots x_{k-1}) = \sum_{a < x_k} P(a / x_1 \dots x_{k-1}), \qquad k = 2, \dots, n. \tag{9}$$

Clearly,

$$\sum_{j=1}^{n} \sum_{a < x_j} N(x_1 \dots x_{j-1} a) = |S|\big(q(x_1) + q(x_2/x_1)P(x_1) + q(x_3/x_1 x_2)P(x_2/x_1)P(x_1) + q(x_4/x_1 x_2 x_3)P(x_3/x_1 x_2)P(x_2/x_1)P(x_1) + \dots\big).$$

From this equality and (4) we obtain

$$\mathrm{code}(x_1 \dots x_n) = |S|\big(q(x_1) + q(x_2/x_1)P(x_1) + q(x_3/x_1 x_2)P(x_2/x_1)P(x_1) + \dots\big). \tag{10}$$

In short, the proposed method may be described as follows: first, use the proper
arrangement of brackets in (10) and, second, carry out all calculations with rational
numbers. For the sake of simplicity we assume that $\log n$ is an integer. (In the general
case we can append a fixed letter, for example $0$, to the end of every word from $S$ in
order to make $\log n$ an integer; this affects neither $|S|$ nor the complexity of the code.)
The formal implementation of the proper arrangement of brackets is:

$$p^0_1 = P(x_1),\; p^0_2 = P(x_2/x_1),\; \dots,\; p^0_n = P(x_n / x_1 x_2 \dots x_{n-1}),$$
$$\lambda^0_1 = q(x_1),\; \lambda^0_2 = q(x_2/x_1),\; \dots,\; \lambda^0_n = q(x_n / x_1 \dots x_{n-1}), \tag{11}$$

$$p^s_k = p^{s-1}_{2k-1} \cdot p^{s-1}_{2k}, \qquad \lambda^s_k = \lambda^{s-1}_{2k-1} + p^{s-1}_{2k-1} \cdot \lambda^{s-1}_{2k}, \qquad s = 1, 2, \dots, \log n;\; k = 1, 2, \dots, n/2^s. \tag{12}$$

All calculations are carried out with rational numbers: all $p^s_k$ and $\lambda^s_k$ are
fractions represented as pairs of integers. The Schönhage–Strassen method is used for
multiplications.

As a result we have

$$\lambda^{\log n}_1 = \big(q(x_1) + q(x_2/x_1)P(x_1)\big) + \big(q(x_3/x_1 x_2) + q(x_4/x_1 \dots x_3)P(x_3/x_1 x_2)\big)\cdot\big(P(x_1)P(x_2/x_1)\big) + \dots$$

We define the proposed code $\alpha_c$ as follows:

$$\alpha_c(x_1 \dots x_n) = |S| \cdot \lambda^{\log n}_1. \tag{13}$$

Now let us consider some examples.
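A sketch of the level-by-level computation (12) with exact rationals, assuming $n$ is a power of two (function names are hypothetical). `sequential` evaluates the unbracketed sum (10) divided by $|S|$, so the two results can be compared; the sample input uses the values $p^0_k, \lambda^0_k$ of the worked example on binary words that follows.

```python
from fractions import Fraction as F


def bracket_tree(p0, lam0):
    """lambda^{log n}_1 computed level by level with (12); p0, lam0 as in (11)."""
    p, lam = list(p0), list(lam0)
    n = len(p)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a power of two (pad the words as described above)")
    while len(p) > 1:
        half = range(len(p) // 2)
        # lambda^s_k = lambda^{s-1}_{2k-1} + p^{s-1}_{2k-1} * lambda^{s-1}_{2k}  (0-based: 2k, 2k+1)
        lam = [lam[2 * k] + p[2 * k] * lam[2 * k + 1] for k in half]
        # p^s_k = p^{s-1}_{2k-1} * p^{s-1}_{2k}
        p = [p[2 * k] * p[2 * k + 1] for k in half]
    return lam[0]


def sequential(p0, lam0):
    """The unbracketed sum in (10), divided by |S|."""
    total, prefix = F(0), F(1)
    for p, lam in zip(p0, lam0):
        total += lam * prefix
        prefix *= p
    return total


# p^0_k and lambda^0_k for n = 8, three ones, word 01000101.
p0 = [F(5, 8), F(3, 7), F(4, 6), F(3, 5), F(2, 4), F(2, 3), F(1, 2), F(1)]
lam0 = [F(0), F(4, 7), F(0), F(0), F(0), F(1, 3), F(0), F(0)]
print(bracket_tree(p0, lam0), 56 * bracket_tree(p0, lam0))  # 3/8 21
```

`Fraction` reduces $21/56$ to $3/8$ automatically; multiplying by $|S| = 56$ gives the code word 21, as in (13).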

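First, we consider the ranking of binary words with a given number of ones. Recall that

$$\binom{t}{p} = \binom{t-1}{p-1}\frac{t}{p}, \qquad \binom{t}{p} = \binom{t-1}{p}\frac{t}{t-p}.$$

Let $\delta$ be $0$ or $1$. Combining the last equalities, we obtain

$$\binom{t-1}{p-\delta} \Big/ \binom{t}{p} = \frac{\delta p + (1-\delta)(t-p)}{t}.$$

This equality together with (9) and (5) yields, for words with exactly $k$ ones (the number
denoted $r$ above),

$$P(x_t / x_1 \dots x_{t-1}) = \frac{x_t\big(k - \sum_{j=1}^{t-1} x_j\big) + (1 - x_t)\big(n - t + 1 - (k - \sum_{j=1}^{t-1} x_j)\big)}{n - t + 1}, \tag{14}$$

$$q(x_t / x_1 \dots x_{t-1}) = \frac{x_t\big(n - t + 1 - (k - \sum_{j=1}^{t-1} x_j)\big)}{n - t + 1}, \tag{15}$$

$t = 1, 2, \dots, n$.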
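Formulas (14) and (15) can be compared against definition (9) by counting words directly. In this sketch (function names are hypothetical) a word is a list of 0/1 letters and `k` is the number of ones, as in (14); `P_q_counting` enumerates all of $S$ and is only meant for tiny $n$.

```python
from fractions import Fraction
from itertools import combinations


def P_q_formula(prefix, x_t, n, k):
    """(14), (15) for words of length n with exactly k ones; prefix = [x_1, ..., x_{t-1}]."""
    t = len(prefix) + 1
    ones_left = k - sum(prefix)  # ones still to be placed in positions t..n
    slots = n - t + 1            # number of positions t..n
    P = Fraction(x_t * ones_left + (1 - x_t) * (slots - ones_left), slots)
    q = Fraction(x_t * (slots - ones_left), slots)
    return P, q


def P_q_counting(prefix, x_t, n, k):
    """The same quantities from definition (9), by counting the words of S (exponential)."""
    S = [[1 if j in ones else 0 for j in range(n)] for ones in combinations(range(n), k)]

    def N(pre):
        return sum(1 for w in S if w[:len(pre)] == pre)

    base = N(prefix)  # N of the empty prefix is |S|, giving P(x_1) = N(x_1)/|S|
    P = Fraction(N(prefix + [x_t]), base)
    q = sum((Fraction(N(prefix + [a]), base) for a in range(x_t)), Fraction(0))
    return P, q


word = [0, 1, 0, 0, 0, 1, 0, 1]
print([str(P_q_formula(word[:t], word[t], 8, 3)[0]) for t in range(8)])
# ['5/8', '3/7', '2/3', '3/5', '1/2', '2/3', '1/2', '1']
```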

Let us give an example. Let $n = 8$, $k = 3$ and the word $x_1 x_2 \dots x_8 = 01000101$.
From (14), (15) and (11), (12) we obtain

$$P(x_1) = P(0) = \frac{0\cdot(3-0) + (1-0)(8-1+1-(3-0))}{8} = 5/8,$$
$$P(x_2/x_1) = P(1/0) = \frac{1\cdot(3-0) + (1-1)(8-2+1-(3-0))}{7} = 3/7,$$
$$P(x_3/x_1 x_2) = P(0/01) = \frac{0\cdot(3-1) + (1-0)(8-3+1-(3-1))}{6} = 4/6,$$
$$P(x_4/x_1 x_2 x_3) = P(0/010) = \frac{0\cdot(3-1) + (1-0)(8-4+1-(3-1))}{5} = 3/5,$$
$$P(x_5/x_1 \dots x_4) = P(0/0100) = \frac{0\cdot(3-1) + (1-0)(8-5+1-(3-1))}{4} = 2/4,$$
$$P(x_6/x_1 \dots x_5) = P(1/01000) = \frac{1\cdot(3-1) + (1-1)(8-6+1-(3-1))}{3} = 2/3,$$
$$P(x_7/x_1 \dots x_6) = P(0/010001) = \frac{0\cdot(3-2) + (1-0)(8-7+1-(3-2))}{2} = 1/2,$$
$$P(x_8/x_1 \dots x_7) = P(1/0100010) = \frac{1\cdot(3-2) + (1-1)(8-8+1-(3-2))}{1} = 1;$$

$$q(x_1) = q(0) = 0, \qquad q(x_2/x_1) = q(1/0) = \frac{1\cdot(8-2+1-(3-0))}{7} = 4/7,$$
$$q(x_3/x_1 x_2) = q(x_4/x_1 \dots x_3) = q(x_5/x_1 \dots x_4) = 0,$$
$$q(x_6/x_1 \dots x_5) = q(1/01000) = \frac{1\cdot(8-6+1-(3-1))}{3} = 1/3,$$
$$q(x_7/x_1 \dots x_6) = 0, \qquad q(x_8/x_1 \dots x_7) = q(1/0100010) = \frac{1\cdot(8-8+1-(3-2))}{1} = 0.$$

Hence

$$p^0_1 = 5/8,\; p^0_2 = 3/7,\; p^0_3 = 4/6,\; p^0_4 = 3/5,\; p^0_5 = 2/4,\; p^0_6 = 2/3,\; p^0_7 = 1/2,\; p^0_8 = 1,$$
$$\lambda^0_1 = 0,\; \lambda^0_2 = 4/7,\; \lambda^0_3 = 0,\; \lambda^0_4 = 0,\; \lambda^0_5 = 0,\; \lambda^0_6 = 1/3,\; \lambda^0_7 = 0,\; \lambda^0_8 = 0;$$

$$p^1_1 = 5/8 \cdot 3/7,\; p^1_2 = 4/6 \cdot 3/5,\; p^1_3 = 2/4 \cdot 2/3,\; p^1_4 = 1/2 \cdot 1,$$
$$\lambda^1_1 = 0 + 5/8 \cdot 4/7,\; \lambda^1_2 = 0 + 4/6 \cdot 0 = 0,\; \lambda^1_3 = 0 + 2/4 \cdot 1/3,\; \lambda^1_4 = 0 + 1/2 \cdot 0 = 0;$$

$$p^2_1 = 5/8 \cdot 3/7 \cdot 4/6 \cdot 3/5, \qquad p^2_2 = 2/4 \cdot 2/3 \cdot 1/2 \cdot 1,$$
$$\lambda^2_1 = 5/8 \cdot 4/7 + (5/8 \cdot 3/7) \cdot 0 = 5/8 \cdot 4/7, \qquad \lambda^2_2 = 2/4 \cdot 1/3 + (2/4 \cdot 2/3) \cdot 0 = 2/4 \cdot 1/3;$$

$$\lambda^3_1 = 5/8 \cdot 4/7 + 5/8 \cdot 3/7 \cdot 4/6 \cdot 3/5 \cdot 2/4 \cdot 1/3 = 20/56 + 1/56 = 21/56.$$
Observe that there are $\binom{8}{3} = 56$ binary words of length 8 with 3 ones. Thus, from
(13) we obtain the code word

$$\alpha_c(01000101) = 56 \cdot (21/56) = 21.$$

Of course, calculation according to formula (6) gives the same result:
$\mathrm{code}(01000101) = 21$. (For clarity we carry out all calculations with decimal
numbers instead of binary ones.)

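Let us consider the enumeration of permutations. From the definition we obtain
$N_S(x_1 \dots x_k) = (n-k)!$; see also (7). Hence $P(x_k/x_1 \dots x_{k-1}) = 1/(n-k+1)$ and
$q(x_k/x_1 \dots x_{k-1}) = r_k/(n-k+1)$, and the equalities (9), (11)–(13) yield

$$\alpha_c(x_1 x_2 \dots x_n) = n!\Bigg(\Big(\frac{r_1}{n} + \frac{r_2}{n(n-1)}\Big) + \frac{1}{n(n-1)}\Big(\frac{r_3}{n-2} + \frac{r_4}{(n-2)(n-3)}\Big) + \frac{1}{n(n-1)(n-2)(n-3)}\bigg(\Big(\frac{r_5}{n-4} + \frac{r_6}{(n-4)(n-5)}\Big) + \frac{1}{(n-4)(n-5)}\Big(\frac{r_7}{n-6} + \frac{r_8}{(n-6)(n-7)}\Big)\bigg) + \dots\Bigg).$$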
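The same bracketing yields permutation ranks. A minimal sketch (the name `alpha_c_permutation` is hypothetical) with $p^0_k = 1/(n-k+1)$ and $\lambda^0_k = r_k/(n-k+1)$; when $n$ is not a power of two it appends letters with $P = 1$ and $q = 0$, which, like the padding described after (10), does not change $\lambda^{\log n}_1$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def alpha_c_permutation(perm):
    """Rank of a permutation via (11)-(13) with P = 1/(n-k+1), q = r_k/(n-k+1)."""
    n = len(perm)
    r = [sum(1 for y in perm[i + 1:] if y < x) for i, x in enumerate(perm)]
    p = [Fraction(1, n - k) for k in range(n)]            # 0-based k: P = 1/(n - k)
    lam = [Fraction(r[k], n - k) for k in range(n)]       # q = r_{k+1}/(n - k)
    size = 1
    while size < n:                                       # pad to a power of two
        size *= 2
    p += [Fraction(1)] * (size - n)
    lam += [Fraction(0)] * (size - n)
    while len(p) > 1:                                     # levels s = 1, ..., log n of (12)
        half = range(len(p) // 2)
        lam = [lam[2 * k] + p[2 * k] * lam[2 * k + 1] for k in half]
        p = [p[2 * k] * p[2 * k + 1] for k in half]
    value = factorial(n) * lam[0]                         # (13) with |S| = n!
    if value.denominator != 1:
        raise ArithmeticError("alpha_c should be an integer")
    return value.numerator


print([alpha_c_permutation(q) for q in permutations([1, 2, 3])])  # [0, 1, 2, 3, 4, 5]
```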

In order to estimate the complexity of the method $\alpha_c$ we define several values. Let,
as before, $S \subset A^n$ be given. By definition, $T$ is the maximal time (in bit operations)
for calculating the rational fractions $N(x_1 \dots x_{t+1})/N(x_1 \dots x_t)$, where
$x_1 \dots x_n \in S$, $t = 1, 2, \dots, n-1$; $M$ is the size (in bits) of the program that is
used to compute

$$\{N(x_1 \dots x_{t+1})/N(x_1 \dots x_t) : x_1 \dots x_n \in S,\ t = 1, 2, \dots, n-1\};$$

and $\hat{Q}$ is the maximal denominator of the rational fractions

$$N(x_1 \dots x_{t+1})/N(x_1 \dots x_t), \quad x_1 \dots x_n \in S,\ t = 1, \dots, n-1.$$

Let us define

$$Q = \max\{|A|, \hat{Q}\}. \tag{16}$$

Theorem 1. Let there be given an alphabet $A$, an integer $n$ and $S \subset A^n$. The proposed
method of encoding $\alpha_c$ has the following properties:

i) $\alpha_c$ is correct, i.e. $\alpha_c(x) \ne \alpha_c(y)$ for all $x, y \in S$, $x \ne y$, and $\alpha_c(x)$
is an integer from $[0, |S| - 1]$;

ii) the time of encoding per letter is

$$T + O(\log n \log Q \log(n \log Q) \log\log(n \log Q))$$

bit operations;

iii) the memory size of the encoder is $M + O(n \log Q \log n)$ bits.

Proof. Claim i) immediately follows from (10)–(13).

For the sake of simplicity of the proof of ii) we assume that $\log n$ and $\log Q$ are
integers. According to the definition of $Q$ and (9), the notation of every $P(\cdot)$ and
$q(\cdot)$ uses $2\log Q$ bits ($\log Q$ bits for the numerator and $\log Q$ bits for the
denominator). That is why the calculation of $p^1_k$, $k = 1, 2, \dots, n/2$, according to (12)
takes $2(n/2)$ multiplications of numbers of length $\log Q$ bits, and the calculation of
$\lambda^1_k$, $k = 1, \dots, n/2$, according to (12) and the formula $a/b + c/d = (ad + bc)/(bd)$
takes $3(n/2)$ multiplications of numbers of length $\log Q$ bits. The calculation of
$p^2_k, \lambda^2_k$, $k = 1, 2, \dots, n/4$, takes $5(n/4)$ multiplications of numbers of length
$2\log Q$ bits each. Similarly, the calculation of $p^i_k, \lambda^i_k$, $k = 1, 2, \dots, n/2^i$, takes
$5(n/2^i)$ multiplications of numbers of length $2^{i-1}\log Q$ bits. From (3) we obtain that
the total time of calculation is

$$(5n/2)\,O(\log Q \log\log Q \log\log\log Q) + (5n/4)\,O(2\log Q \log(2\log Q)\log\log(2\log Q)) + \dots$$
$$+ (5n/2^i)\,O(2^{i-1}\log Q \log(2^{i-1}\log Q)\log\log(2^{i-1}\log Q)) + \dots + 5\,O(n\log Q \log(n\log Q)\log\log(n\log Q)).$$

Each of the $\log n$ terms is $O(n \log Q \log(n \log Q) \log\log(n \log Q))$, so the sum is not
more than

$$O(n \log n \log Q \log(n \log Q) \log\log(n \log Q))$$

bit operations. This yields

$$O(\log n \log Q \log(n \log Q) \log\log(n \log Q)) \tag{17}$$

bit operations per letter for the calculation of $\lambda^{\log n}_1$. In order to obtain
$\alpha_c(x_1 \dots x_n)$ we have to calculate the product $|S|\lambda^{\log n}_1$, see (13). Since $S$
is a subset of $A^n$, $|S| \le |A|^n$, and the binary notation of the numbers $|S|$ and
$\lambda^{\log n}_1$ takes not more than $n \log |A|$ bits. From (3) we can see that calculating
$|S|\lambda^{\log n}_1$ takes $O(n \log|A| \log(n \log|A|) \log\log(n \log|A|))$ bit operations, i.e.
$O(\log|A| \log(n \log|A|) \log\log(n \log|A|))$ bit operations per letter. From this, (17) and
(16) we obtain ii).

In order to estimate the memory size of the encoder, note that when it calculates
$\lambda^i_k, p^i_k$ it needs to store only $\lambda^{i-1}_k, p^{i-1}_k$, $i = 2, \dots, \log n$; the same
memory is used to store $\{\lambda^{i-1}_k, p^{i-1}_k;\ k = 1, \dots, n/2^{i-1}\}$ and
$\{\lambda^i_k, p^i_k;\ k = 1, \dots, n/2^i\}$. From this and the definitions of $M$ and $Q$ we easily
obtain iii). Theorem 1 is proved.

4 Fast Decoding 

First, we describe the general scheme of decoding, without taking into account the time of
calculation. Let an alphabet $A = \{0, 1, \dots, m-1\}$ and a source $S \subset A^n$ be given, let
$x = x_1 x_2 \dots x_n$ be a word from $S$, and let $y = \alpha_c(x)$ be the code of $x$.

In order to decode $y$ we consider $y_1 = y/|S|$ as a rational number and first find $i_1$
satisfying the inequalities

$$q(i_1) \le y_1 < q(i_1 + 1). \tag{18}$$

From these inequalities it follows that the first letter of the encoded word is $i_1$:
$x_1 = i_1$. After that we calculate

$$z = (y_1 - \lambda^0_1)/p^0_1,$$

where $z$ is a rational number, and find $i_2$ satisfying the inequalities

$$q(i_2/x_1) \le z < q((i_2 + 1)/x_1). \tag{19}$$

It follows that the second letter is $i_2$.

Of course, we could continue in this way to find the third letter, then the fourth one,
etc. Instead, we use a more elaborate way which makes it possible to operate with short
numbers. We calculate $\lambda^1_1$ according to (12) (this is possible because $x_1$ and $x_2$
are known now). After that we calculate

$$y_2 = (y_1 - \lambda^1_1)/p^1_1 \tag{20}$$

and find the letters $x_3, x_4$ using $y_2$ in the same way as we found $x_1, x_2$ using $y_1$.
Then we calculate $\lambda^1_2$ using $x_3$ and $x_4$, and $\lambda^2_1$ using $\lambda^1_1$, $\lambda^1_2$ and
$p^1_1$, see (12). And so on.
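The exact (unrounded) version of this scheme can be sketched in Python with rational arithmetic. The sketch assumes the binary source with $n$ letters and $r$ ones from the first example, using (14) and (15); `make_source`, `encode_block`, `decode_block`, `encode` and `decode` are hypothetical names. Every block is split into halves of lengths $\lceil \ell/2 \rceil$ and $\lfloor \ell/2 \rfloor$; the left half is decoded from the current value and the right half from the value rescaled as in (20). It does not use the truncated estimates introduced next, so the numbers it handles are long.

```python
from fractions import Fraction
from math import comb


def make_source(n, r):
    """Return cond(prefix, a) -> (P(a/prefix), q(a/prefix)) from (14), (15)."""
    def cond(prefix, a):
        slots = n - len(prefix)            # positions t..n, where t = len(prefix) + 1
        ones_left = r - sum(prefix)
        P = Fraction(a * ones_left + (1 - a) * (slots - ones_left), slots)
        q = Fraction(a * (slots - ones_left), slots)
        return P, q
    return cond


def encode_block(word, prefix, cond):
    """Return (lambda, p) of a block, combined by the bracketing (12)."""
    if len(word) == 1:
        P, q = cond(prefix, word[0])
        return q, P
    half = (len(word) + 1) // 2
    lam_l, p_l = encode_block(word[:half], prefix, cond)
    lam_r, p_r = encode_block(word[half:], prefix + word[:half], cond)
    return lam_l + p_l * lam_r, p_l * p_r


def decode_block(z, prefix, length, cond, m=2):
    """Decode `length` letters after `prefix` from z; also return the block's (lambda, p)."""
    if length == 1:
        for a in range(m):                 # (18)/(19): q(a) <= z < q(a) + P(a) = q(a + 1)
            P, q = cond(prefix, a)
            if q <= z < q + P:
                return [a], q, P
        raise ValueError("z does not encode a word of S")
    half = (length + 1) // 2
    left, lam_l, p_l = decode_block(z, prefix, half, cond, m)
    z_right = (z - lam_l) / p_l            # rescaling as in (20)
    right, lam_r, p_r = decode_block(z_right, prefix + left, length - half, cond, m)
    return left + right, lam_l + p_l * lam_r, p_l * p_r


def encode(word, n, r):
    lam, _ = encode_block(list(word), [], make_source(n, r))
    return comb(n, r) * lam                # (13), |S| = C(n, r)


def decode(y, n, r):
    letters, _, _ = decode_block(Fraction(y, comb(n, r)), [], n, make_source(n, r))
    return letters


print(decode(21, 8, 3), encode([0, 1, 0, 0, 0, 1, 0, 1], 8, 3))  # [0, 1, 0, 0, 0, 1, 0, 1] 21
```

The left half can be decoded directly from $z$ because $z = \lambda_{\text{left}} + p_{\text{left}} \cdot u$ with $0 \le u < 1$, so $z$ lies in the same interval as the code of the left half alone.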
The point is that when we carry out the calculations (18)–(20) we can use only estimates of
$y_1, y_2, \lambda^1_1, p^1_1$, etc., which are based on a few leading digits. More exactly, for
every value we will use two estimates: an upper bound and a lower one.

In order to give the exact definition, we first define several auxiliary values. Let $p/q$ be
a rational number represented as a pair of integers $p, q$, $0 \le p < q$, and let $t \ge 1$ be an
integer. We define two functions $\varphi^+_t(p/q)$ and $\varphi^-_t(p/q)$ as follows. Let
$l = \lfloor \log q \rfloor$, and let $(q_l q_{l-1} \dots q_0)$ and $(p_l \dots p_0)$ be the binary
representations of $q$ and $p$, respectively. Then

$$\varphi^+_t(p/q) = \Big(\sum_{i=l-t}^{l} p_i 2^i + 2^{l-t}\Big) \Big/ \Big(\sum_{i=l-t}^{l} q_i 2^i\Big),$$

$$\varphi^-_t(p/q) = \Big(\sum_{i=l-t}^{l} p_i 2^i\Big) \Big/ \Big(\sum_{i=l-t}^{l} q_i 2^i + 2^{l-t}\Big).$$

For example, $\varphi^+_3(5/17) = 3/8$, $\varphi^-_3(5/17) = 2/9$.
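A sketch of $\varphi^+_t$ and $\varphi^-_t$ on integer pairs $p, q$ (the names `phi_plus` and `phi_minus` are hypothetical). It keeps the bits of $p$ and $q$ in positions $l-t, \dots, l$ and adds $2^{l-t}$ to the numerator (for $\varphi^+_t$) or the denominator (for $\varphi^-_t$). This index range is our reading of the definition, chosen because it reproduces the example values above and satisfies the bounds of the Lemma below; when $t > l$ nothing is cut and $2^{l-t}$ is a fraction.

```python
from fractions import Fraction


def _keep_leading_bits(v, shift):
    """Zero the bits of v below position `shift` (nothing to drop if shift <= 0)."""
    return (v >> shift) << shift if shift > 0 else v


def phi_plus(p, q, t):
    """Upper estimate of p/q: cut below bit l - t, then add 2^(l-t) to the numerator."""
    l = q.bit_length() - 1                 # l = floor(log2 q)
    unit = Fraction(2) ** (l - t)          # 2^(l-t), a fraction when t > l
    return (_keep_leading_bits(p, l - t) + unit) / _keep_leading_bits(q, l - t)


def phi_minus(p, q, t):
    """Lower estimate of p/q: the same cut, with 2^(l-t) added to the denominator."""
    l = q.bit_length() - 1
    unit = Fraction(2) ** (l - t)
    return _keep_leading_bits(p, l - t) / (_keep_leading_bits(q, l - t) + unit)


print(phi_plus(5, 17, 3), phi_minus(5, 17, 3))  # 3/8 2/9
```

Both estimates use only the leading bits of $p$ and $q$, which is what lets the decoder work with short numbers.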
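We will need the following simple bounds.

Lemma. Let $p, q, t$ be integers, $0 \le p < q$, $t \ge 2$. Then

$$0 \le \varphi^+_t(p/q) - p/q < 4 \cdot 2^{-t}, \tag{21}$$

$$0 \le p/q - \varphi^-_t(p/q) < 2^{2-t}. \tag{22}$$

Proof. It is easy to see that if $0 \le x \le 1/2$, then

$$\frac{1}{1-x} \le 1 + 2x, \qquad \frac{1}{1+x} \ge 1 - x. \tag{23}$$

These bounds immediately follow from the well-known equalities

$$(1 - x)^{-1} = 1 + x + x^2 + \dots = 1 + x + x^2/(1 - x) = 1 + x\big(1 + x/(1 - x)\big),$$
$$(1 + x)^{-1} = 1 - x + x^2 - \dots = 1 - x + x^2(1 - x + x^2 - \dots).$$

The following sequence of inequalities gives the upper bound in (21):

$$\varphi^+_t(p/q) \le \frac{p + 2^{l-t}}{q - 2^{l-t}} \le \Big(\frac{p}{q} + \frac{2^{l-t}}{q}\Big)\Big(1 + 2\cdot\frac{2^{l-t}}{q}\Big) = \frac{p}{q} + \frac{2^{l-t}}{q} + 2\cdot\frac{2^{l-t}}{q}\cdot\frac{p}{q} + 2\Big(\frac{2^{l-t}}{q}\Big)^2 \le \frac{p}{q} + 2^{-t} + 2\cdot 2^{-t} + 2\cdot 2^{-2t} < \frac{p}{q} + 4 \cdot 2^{-t}.$$

Here we use (23) and the obvious inequality $2^l \le q$. The lower bound in (21) holds because
$\varphi^+_t$ removes less than $2^{l-t}$ from $p$, adds $2^{l-t}$ to it and does not increase $q$; the
bound (22) is proved in the same way.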

Let us proceed with the description of the decoding method. Let, as before, an alphabet
$A = \{0, 1, \dots, m-1\}$ and a source $S \subset A^n$ be given.

As before, let $Q$ be the maximal denominator of the rational numbers
$N(x_1 \dots x_{t+1})/N(x_1 \dots x_t)$, $x_1 \dots x_n \in S$, $t = 1, 2, \dots, n-1$. From this
definition and (11), (12) we immediately obtain that the denominators of the rational
fractions $p^s_k$ and $\lambda^s_k$ do not exceed $Q^{2^s}$ for all $s = 1, \dots, \log n$;
$k = 1, \dots, n/2^s$. Let

$$h = \lceil \log Q \rceil + 3. \tag{24}$$

We will give the definition by induction on $n$. First, let $n = 2$. For every value
$\lambda^s_k$ we define an upper estimate $\lambda^+(s, k)$ and a lower estimate $\lambda^-(s, k)$. Let the
decoder calculate

$$\lambda^+(1, 1) = \varphi^+_{2h}(y/|S|), \qquad \lambda^-(1, 1) = \varphi^-_{2h}(y/|S|), \tag{25}$$

$$\lambda^+(0, 1) = \varphi^+_h(\lambda^+(1, 1)), \qquad \lambda^-(0, 1) = \varphi^-_h(\lambda^-(1, 1)). \tag{26}$$

Then it finds $i_1$ satisfying the inequalities

$$q(i_1) \le \lambda^+(0, 1), \qquad q(i_1 + 1) > \lambda^-(0, 1). \tag{27}$$

We use these inequalities instead of (18), but here the decoder carries out calculations
with $(h + 3)$-length words instead of the whole binary notations of $y$ and $|S|$. The
inequalities (27) mean that the first letter of the coded word is $i_1$: $x_1 = i_1$. Let us
define

$$\lambda^+(0, 2) = \varphi^+_h\big((\lambda^+(1, 1) - \lambda^0_1)/p^0_1\big), \qquad \lambda^-(0, 2) = \varphi^-_h\big((\lambda^-(1, 1) - \lambda^0_1)/p^0_1\big) \tag{28}$$

and find $i_2$ satisfying the inequalities

$$q(i_2/x_1) \le \lambda^+(0, 2), \qquad q((i_2 + 1)/x_1) > \lambda^-(0, 2).$$

This means that the second letter is $i_2$: $x_2 = i_2$.
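Now let $n$ be greater than 2. In order to use the divide-and-conquer principle we define

$$\lambda^+(\lceil \log n \rceil, 1) = \varphi^+_{hn}(y/|S|), \qquad \lambda^-(\lceil \log n \rceil, 1) = \varphi^-_{hn}(y/|S|),$$
$$\lambda^+(\lceil \log n \rceil - 1, 1) = \varphi^+_{h\lceil n/2 \rceil}\big(\lambda^+(\lceil \log n \rceil, 1)\big), \qquad \lambda^-(\lceil \log n \rceil - 1, 1) = \varphi^-_{h\lceil n/2 \rceil}\big(\lambda^-(\lceil \log n \rceil, 1)\big). \tag{29}$$

Then the decoder finds $x_1, x_2, \dots, x_{\lceil n/2 \rceil}$ using $\lambda^+(\lceil \log n \rceil - 1, 1)$ and
$\lambda^-(\lceil \log n \rceil - 1, 1)$, and calculates $\lambda^{\lceil \log n \rceil - 1}_1$ (see (11), (12)).
After that the decoder calculates

$$\lambda^+(\lceil \log n \rceil - 1, 2) = \varphi^+_{h\lceil n/2 \rceil}\Big(\big(\lambda^+(\lceil \log n \rceil, 1) - \lambda^{\lceil \log n \rceil - 1}_1\big)/p^{\lceil \log n \rceil - 1}_1\Big),$$
$$\lambda^-(\lceil \log n \rceil - 1, 2) = \varphi^-_{h\lceil n/2 \rceil}\Big(\big(\lambda^-(\lceil \log n \rceil, 1) - \lambda^{\lceil \log n \rceil - 1}_1\big)/p^{\lceil \log n \rceil - 1}_1\Big) \tag{30}$$

and uses this pair in order to find $x_{\lceil n/2 \rceil + 1}, x_{\lceil n/2 \rceil + 2}, \dots, x_n$. So (29)
and (30) make it possible to decode the $n$-letter word as two words of lengths
$\lceil n/2 \rceil$ and $n - \lceil n/2 \rceil$, respectively.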

In order to give an example of decoding, let us consider the previous example. Let, as
before, $S$ be the set of all binary words of length 8 with exactly 3 ones, and let the
proposed method be applied to decoding the word $01000101 = (21)_{10}$.

According to the description, the decoder first finds

$$\lambda^+(\lceil \log n \rceil, 1) = \lambda^+(3, 1) = \varphi^+(21/56) = (21 \cdot 2^{42} + 1)/(56 \cdot 2^{42}),$$
$$\lambda^-(\lceil \log n \rceil, 1) = \lambda^-(3, 1) = \varphi^-(21/56) = (21 \cdot 2^{42})/(56 \cdot 2^{42} + 1)$$

and

$$\lambda^+(2, 1) = \varphi^+\big((21 \cdot 2^{42} + 1)/(56 \cdot 2^{42})\big) = (21 \cdot 2^{23} + 1)/(56 \cdot 2^{23}),$$
$$\lambda^-(2, 1) = (21 \cdot 2^{23})/(56 \cdot 2^{23} + 1).$$

Using $\lambda^+(2, 1)$ and $\lambda^-(2, 1)$ the decoder has to find $x_1, x_2, x_3, x_4$. According to
the algorithm, it calculates

$$\lambda^+(1, 1) = \varphi^+\big((21 \cdot 2^{23} + 1)/(56 \cdot 2^{23})\big) = (21 \cdot 2^7 + 1)/(56 \cdot 2^7),$$
$$\lambda^-(1, 1) = (21 \cdot 2^7)/(56 \cdot 2^7 + 1).$$

This pair encodes $x_1, x_2$. After that the decoder finds

$$\lambda^+(0, 1) = \varphi^+\big((21 \cdot 2^7 + 1)/(56 \cdot 2^7)\big) = 22/56, \qquad \lambda^-(0, 1) = \varphi^-\big((21 \cdot 2^7)/(56 \cdot 2^7 + 1)\big) = 21/57,$$

which encode $x_1$. For the given $S$, $q(0) = 0$ and $q(1) = 5/8$ (see the example of coding),
so $x_1 = 0$ because

$$0 = q(i_1) \le \lambda^+(0, 1) = 22/56, \qquad 21/57 = \lambda^-(0, 1) < q(i_1 + 1) = 5/8$$

(see (27)). According to (28) the decoder calculates

$$\lambda^+(0, 2) = \varphi^+\Big(\big((21 \cdot 2^7 + 1)/(56 \cdot 2^7) - 0\big)/(5/8)\Big) = 169/280, \qquad \lambda^-(0, 2) = 168/281.$$

Thus $x_2 = 1$, because $q(0/0) = 0$, $q(1/0) = 4/7$ and, obviously,

$$4/7 \le 169/280 = \lambda^+(0, 2), \qquad \lambda^-(0, 2) = 168/281 < 1.$$

After finding $x_1 = 0$ and $x_2 = 1$ the decoder calculates $\lambda^1_1 = 5/14$, $p^1_1 = 15/56$ and
finds $\lambda^+(1, 2)$, $\lambda^-(1, 2)$ in the same way as in (30). This gives $x_3 = 0$, $x_4 = 0$,
and so on.

The next theorem characterizes the properties of the proposed method of decoding, which we
denote by $\alpha_d$.

Theorem 2. Let there be given an alphabet $A$, an integer $n$ and $S \subset A^n$. Then the
proposed method of decoding $\alpha_d$ has the following properties:

i) $\alpha_d$ is correct, i.e. for every $x \in S$

$$\alpha_d(\alpha_c(x)) = x;$$

ii) the time of decoding per letter is

$$T + O(\log Q \log n \log(nQ) \log\log(nQ))$$

bit operations;

iii) the memory size of the decoder is

$$M + O(n \log Q \log n)$$

bits (here $T$, $M$ and $Q$ are defined as in Theorem 1).

Proof. First, we estimate the speed of decoding. As follows from the algorithm, every
multiplication in the calculation of $\lambda^s_k$ corresponds to two divisions when the decoder
calculates $\lambda^+(i, j)$ and $\lambda^-(i, j)$ according to (28)–(30). By (3), the time of a
division is proportional to the time of a multiplication. So the time of calculating
$\lambda^+(i, j)$ and $\lambda^-(i, j)$ equals the time of calculating $\lambda^s_k$ within a multiplicative
constant. It is easy to see that the time of finding $i_1 = x_1, i_2 = x_2, \dots$ according to
(27) does not change the asymptotic estimate of the decoding time.

Let us estimate the memory size. For this purpose we note that the decoder can use the same
memory for decoding the first letters $x_1 \dots x_{\lceil n/2 \rceil}$ and the letters
$x_{\lceil n/2 \rceil + 1}, \dots, x_n$. It immediately follows that the decoder can use the same
memory size as the encoder, which gives the same estimate of the memory size.

Now we will show that $\alpha_d$ is a correct method of decoding. For this purpose, for a
letter $x_j$, $j = 1, \dots, n$, we estimate the values

$$|\lambda^+(0, j) - q(x_j)| \quad \text{and} \quad |q(x_j) - \lambda^-(0, j)|.$$

This is important because the decoder decides that the letter $x_j$ should be decoded as
$i_j$ if the inequalities

$$q(i_j) \le \lambda^+(0, j), \qquad q(i_j + 1) > \lambda^-(0, j) \tag{31}$$

hold. As follows from (11), inequalities (31) are equivalent to

$$\lambda^0_{i_j} \le \lambda^+(0, j), \qquad \lambda^0_{i_j + 1} > \lambda^-(0, j), \tag{32}$$

and these inequalities must hold for exactly one $i_j$, namely $i_j = x_j$. This property must
hold for all letters $x_1, x_2, \dots, x_n$ and for all $x_1 \dots x_n \in S$. By definition, $Q$ is
the maximal denominator of the rational fractions

$$P(x_{t+1}/x_1 \dots x_t) = N(x_1 \dots x_{t+1})/N(x_1 \dots x_t),$$

$t = 1, \dots, n-1$; $x_1 \dots x_n \in S$. It means that for every $x_1 \dots x_t$ and $i, j \in A$,
$i \ne j$,

$$|q(i/x_1 \dots x_t) - q(j/x_1 \dots x_t)| \ge 1/Q, \qquad P(i/x_1 \dots x_t) \ge 1/Q, \tag{33}$$

see (9). From the definition (11) we obtain

$$|\lambda^0_{i} - \lambda^0_{j}| \ge 1/Q.$$

From this inequality and (32) we can see that the inequalities

$$0 \le \lambda^+(0, j) - \lambda^0_j < 1/Q, \qquad 0 \le \lambda^0_j - \lambda^-(0, j) < 1/Q$$

guarantee the correctness of decoding. We will prove only the first pair of inequalities,
because the second one can be proved in the same way. We will investigate the value
$\lambda^+(0, n) - \lambda^0_n$ because, as can easily be seen from the proof, the possible error is
maximal for the last letter $x_n$. First, we notice that the inequality

$$\lambda^+(0, j) - \lambda^0_j \ge 0$$

is immediately obtained from the definitions of $\varphi^+(\cdot)$ and $\lambda^+(\cdot)$, see (25)–(30). Now we
have to prove that

$$\lambda^+(0, n) - \lambda^0_n < 1/Q. \tag{34}$$

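We define

$$e(i, j) = \lambda^+(i, j) - \big(\lambda^+(i+1, j/2) - \lambda^i_{j-1}\big)/p^i_{j-1}, \tag{35}$$

$i = 0, \dots, \log n$, $j = n/2^i$. In fact, $e(i, j)$ is the error arising from using
$\varphi^+\big((\lambda^+(i+1, j/2) - \lambda^i_{j-1})/p^i_{j-1}\big)$ instead of the value
$(\lambda^+(i+1, j/2) - \lambda^i_{j-1})/p^i_{j-1}$ (for $i = \log n$ it is the error
$\lambda^+(\log n, 1) - y/|S| = \lambda^+(\log n, 1) - \lambda^{\log n}_1$ of the first estimate). The
following chain of equalities is valid:

$$\lambda^+(0, n) = \varphi^+\big((\lambda^+(1, n/2) - \lambda^0_{n-1})/p^0_{n-1}\big) = e(0, n) + (\lambda^+(1, n/2) - \lambda^0_{n-1})/p^0_{n-1}$$
$$= e(0, n) + \Big(\varphi^+\big((\lambda^+(2, n/4) - \lambda^1_{n/2-1})/p^1_{n/2-1}\big) - \lambda^0_{n-1}\Big)/p^0_{n-1}$$
$$= e(0, n) + \Big(e(1, n/2) + (\lambda^+(2, n/4) - \lambda^1_{n/2-1})/p^1_{n/2-1} - \lambda^0_{n-1}\Big)/p^0_{n-1}$$
$$= e(0, n) + e(1, n/2)/p^0_{n-1} + \Big(\big(\varphi^+\big((\lambda^+(3, n/8) - \lambda^2_{n/4-1})/p^2_{n/4-1}\big) - \lambda^1_{n/2-1}\big)/p^1_{n/2-1} - \lambda^0_{n-1}\Big)/p^0_{n-1}$$
$$= e(0, n) + e(1, n/2)/p^0_{n-1} + e(2, n/4)/(p^0_{n-1} p^1_{n/2-1}) + \Big(\big((\lambda^+(3, n/8) - \lambda^2_{n/4-1})/p^2_{n/4-1} - \lambda^1_{n/2-1}\big)/p^1_{n/2-1} - \lambda^0_{n-1}\Big)/p^0_{n-1}$$
$$= \dots = e(0, n) + e(1, n/2)/p^0_{n-1} + e(2, n/4)/(p^0_{n-1} p^1_{n/2-1}) + e(3, n/8)/(p^0_{n-1} p^1_{n/2-1} p^2_{n/4-1}) + \dots + e(\log n, 1)/(p^0_{n-1} p^1_{n/2-1} \cdots p^{\log n - 1}_{1})$$
$$+ \Big(\big(\dots\big((\lambda^{\log n}_1 - \lambda^{\log n - 1}_1)/p^{\log n - 1}_1 - \lambda^{\log n - 2}_3\big)/p^{\log n - 2}_3 - \dots\big) - \lambda^0_{n-1}\Big)/p^0_{n-1}.$$

So we obtain

$$\lambda^+(0, n) = \sum_{i=0}^{\log n} e(i, n 2^{-i}) \Big/ \prod_{j=0}^{i-1} p^j_{n 2^{-j} - 1} + \Big(\big(\dots\big((\lambda^{\log n}_1 - \lambda^{\log n - 1}_1)/p^{\log n - 1}_1 - \lambda^{\log n - 2}_3\big)/p^{\log n - 2}_3 - \dots\big) - \lambda^0_{n-1}\Big)/p^0_{n-1}.$$

Using the definition (12) of $\lambda^s_k$ we find that the last term equals $\lambda^0_n$, so

$$\lambda^+(0, n) = \sum_{i=0}^{\log n} e(i, n 2^{-i}) \Big/ \prod_{j=0}^{i-1} p^j_{n 2^{-j} - 1} + \lambda^0_n.$$

Thus,

$$\lambda^+(0, n) - \lambda^0_n = \sum_{i=0}^{\log n} e(i, n 2^{-i}) \Big/ \prod_{j=0}^{i-1} p^j_{n 2^{-j} - 1}.$$

This equality together with (33) and (12) yields

$$\lambda^+(0, n) - \lambda^0_n \le e(0, n) + e(1, n/2)\,Q + e(2, n/4)\,Q^3 + e(3, n/8)\,Q^7 + \dots + e(\log n, 1)\,Q^{2^{\log n} - 1}.$$

The claim of the Lemma, (28), (29) and (35) yield

$$e(i, j) < 2^{2 - 2^i h}.$$

The last two inequalities and (24) give us

$$\lambda^+(0, n) - \lambda^0_n < 4\big(1/(8Q) + Q/(8Q)^2 + Q^3/(8Q)^4 + \dots\big).$$

The $i$-th term in the brackets equals $Q^{2^i - 1}/(8Q)^{2^i} = 1/(Q \cdot 8^{2^i})$, and
$\sum_{i \ge 0} 8^{-2^i} < \sum_{k \ge 1} 8^{-k} = 1/7$, so the right-hand side is less than
$4/(7Q)$. Hence we obtain the inequality

$$\lambda^+(0, n) - \lambda^0_n < 1/Q,$$

which completes the proof of (34). Theorem 2 is proved.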






References

1. Reingold E.M., Nievergelt J., Deo N. "Combinatorial Algorithms: Theory and Practice".
Prentice-Hall, Inc., 1977.

2. Krichevsky R. "Universal Compression and Retrieval". Kluwer Academic Publishers, 1994.

3. Cover T.M. "Enumerative Source Encoding". IEEE Trans. Inform. Theory, vol. IT-19,
pp. 73–77, Jan. 1973.

4. Lynch T.J. "Sequence time coding for data compression". Proc. IEEE, vol. 54,
pp. 1490–1491, 1966.

5. Davisson L.D. Comments on "Sequence time coding for data compression". Proc. IEEE,
vol. 54, p. 2010, 1966.

6. Babkin V.F. "A method of universal coding with non-exponent labour consumption".
Probl. Inform. Transmission, vol. 7, pp. 13–21, 1971.

7. Aho A.V., Hopcroft J.E., Ullman J.D. "The Design and Analysis of Computer Algorithms".
Addison-Wesley Publishing Company, 1976.

8. Knuth D.E. "The Art of Computer Programming", Vol. 2. Addison-Wesley, 1981.






